Using BM25F and KLD for Patent Retrieval

Authors

  • Joaquín Pérez-Iglesias
  • Álvaro Rodrigo
  • Víctor Fresno-Fernández
Abstract

In this paper we describe our system for the Prior Art task of CLEF-IP 2010 (a task focused on retrieving the patents relevant to a given patent) and its results. In our system, patents are indexed by field so that the most discriminative terms of each field can be selected, using Kullback-Leibler divergence (KLD) as the feature selection method. Each field is then given its own boost factor, with BM25F as the ranking function. Although CLEF-IP was proposed as a multilingual task, we approached it from a monolingual perspective. Our results are close to the average of last year's results, which encourages us to continue developing the system by adding some form of multilingual processing.
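The abstract does not give the exact formulas or parameters, so the following is only a minimal sketch of the two ideas it names: selecting a field's most discriminative terms by their pointwise Kullback-Leibler contribution, and ranking with a simple form of BM25F that applies per-field boosts. The function names, the field names (`title`, `description`), the default `k1 = 1.2`, the floor of 1 for unseen collection counts, and the non-negative IDF variant are all assumptions made for illustration, not details taken from the paper.

```python
import math
from collections import Counter


def kld_term_scores(field_tokens, collection_tf, collection_len):
    """Pointwise KLD contribution of each term in one document field:
    P(t|field) * log(P(t|field) / P(t|collection))."""
    counts = Counter(field_tokens)
    n = len(field_tokens)
    scores = {}
    for term, count in counts.items():
        p_field = count / n
        # Assumption: floor unseen collection counts at 1 to avoid division by zero.
        p_coll = max(collection_tf.get(term, 0), 1) / collection_len
        scores[term] = p_field * math.log(p_field / p_coll)
    return scores


def select_terms(field_tokens, collection_tf, collection_len, k):
    """Return the k terms with the highest KLD contribution (ties broken alphabetically)."""
    scores = kld_term_scores(field_tokens, collection_tf, collection_len)
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


def bm25f_score(query_terms, doc_fields, boosts, b, avg_len, df, n_docs, k1=1.2):
    """Simple BM25F: combine boosted, length-normalised field frequencies per term,
    saturate the combined weight with k1, then multiply by an IDF factor."""
    score = 0.0
    for term in set(query_terms):
        weight = 0.0
        for field, tokens in doc_fields.items():
            if not tokens:
                continue
            tf = tokens.count(term)
            norm = (1.0 - b[field]) + b[field] * len(tokens) / avg_len[field]
            weight += boosts[field] * tf / norm
        if weight == 0.0:
            continue
        term_df = df.get(term, 0)
        # Assumption: the "+1" IDF variant, which never goes negative.
        idf = math.log(1.0 + (n_docs - term_df + 0.5) / (term_df + 0.5))
        score += weight / (k1 + weight) * idf
    return score
```

Under these assumptions, a query for a patent could be built by calling `select_terms` on each of its fields and passing the selected terms to `bm25f_score` for every candidate document. With a higher boost on `title` than on `description`, a candidate that matches a query term in its title outranks one that matches the same term only in its description, all else being equal.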

Similar resources

Integrating the Probabilistic Models BM25/BM25F into Lucene

This document describes the BM25 and BM25F implementation using the Lucene Java Framework. The implementation described here can be downloaded from [Pérez-Iglesias 08a]. Both models have stood out at TREC for their performance and are considered state-of-the-art in the IR community. BM25 is applied to retrieval on plain text documents, that is, documents that do not contain fields, while BM...

Rich Speech Retrieval Using Query Word Filter

Rich Speech Retrieval performance improves when general query-language words are filtered and both speech recognition transcripts and metadata are indexed via BM25F(ields).

Content Based Radiographic Images Indexing and Retrieval Using Pattern Orientation Histogram

Introduction: Content-Based Image Retrieval (CBIR) is a method of searching for and retrieving images in a database. In medical applications, CBIR is a tool used by physicians to compare previous and current medical images associated with patients' pathological conditions. As the volume of pictorial information stored in medical image databases grows, efficient image indexing and retri...

A Practitioner's Guide for Static Index Pruning

We compare the term- and document-centric static index pruning approaches as described in the literature and investigate their sensitivity to the scoring functions employed during the pruning and actual retrieval stages. 1 Static Inverted Index Pruning Static index pruning permanently removes some information from the index, for the purposes of utilizing the disk space and improving query process...

LaHC at CLEF 2015 SBS Lab

This paper describes the work of the LaHC lab of Saint-Étienne for the Social Book Search lab at CLEF 2015. Our goals were i) to study a field-based retrieval model (BM25F), exploiting various topic and document fields, in order to build a strong baseline for further experiments, ii) to compare it with a Log logistic (LGD) retrieval model, and iii) to exploit some documents related to each top...

Publication date: 2010